
    Breast cancer prognosis by combinatorial analysis of gene expression data

    INTRODUCTION: The potential of applying data analysis tools to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with the twofold objective of discovering patterns characteristic of cases with good or poor outcomes and using them for accurate and justifiable predictions, and of deriving novel information about the role of genes, the existence of special classes of cases, and other factors. METHOD: Data were analyzed using the combinatorics- and optimization-based method of LAD, recently shown to provide highly accurate diagnostic and prognostic systems in cardiology, cancer proteomics, hematology, pulmonology, and other disciplines. RESULTS: LAD identified a subset of 17 of the 25,000 genes capable of fully distinguishing between patients with poor and good prognoses. An extensive list of 'patterns' or 'combinatorial biomarkers' (that is, combinations of genes and limitations on their expression levels) was generated, and 40 patterns were used to create a prognostic system, shown to have 100% and 92.9% weighted accuracy on the training and test sets, respectively. The prognostic system uses fewer genes than other methods and has similar or better accuracy than those reported in other studies. Of the 17 genes identified by LAD, three (respectively, five) were shown to play a significant role in determining poor (respectively, good) prognosis. Two new classes of patients (described by similar sets of covering patterns, gene expression ranges, and clinical features) were discovered. As a by-product of the study, it is shown that the training and test sets of van 't Veer have differing characteristics. CONCLUSION: The study shows that LAD provides an accurate and fully explanatory prognostic system for breast cancer using genomic data, that is, a system that, in addition to predicting good or poor prognosis, provides an individualized explanation of the reasons for that prognosis for each patient. Moreover, the LAD model provides valuable insights into the roles of individual and combinatorial biomarkers, allows the discovery of new classes of patients, and generates a vast library of biomedical research hypotheses.
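
    The 'combinatorial biomarkers' mentioned above are conjunctions of bounds on gene expression levels, and the prognosis is produced by a weighted vote over the patterns a patient satisfies. The following minimal Python sketch illustrates that idea; the gene names, cut-points, and weights are hypothetical placeholders, not values from the study.

        # LAD-style patterns: each one is a conjunction of interval conditions on
        # gene expression levels, paired with a weight used in the prognostic vote.
        # All gene names, bounds, and weights here are made-up placeholders.
        POOR_PATTERNS = [
            ([("GENE_A", 0.42, None), ("GENE_B", None, -0.31)], 1.0),
        ]
        GOOD_PATTERNS = [
            ([("GENE_C", None, 0.10), ("GENE_D", -0.25, None)], 1.0),
        ]

        def satisfies(sample, conditions):
            """True if every (gene, lower, upper) bound in a pattern holds for the sample."""
            for gene, lo, hi in conditions:
                x = sample[gene]
                if (lo is not None and x < lo) or (hi is not None and x > hi):
                    return False
            return True

        def prognosis(sample):
            """Weighted vote of the poor- and good-outcome patterns covering the sample."""
            poor = sum(w for conds, w in POOR_PATTERNS if satisfies(sample, conds))
            good = sum(w for conds, w in GOOD_PATTERNS if satisfies(sample, conds))
            return "poor" if poor > good else "good"

        sample = {"GENE_A": 0.5, "GENE_B": -0.4, "GENE_C": 0.3, "GENE_D": 0.0}
        print(prognosis(sample))  # the covering patterns themselves explain the call

    Because every prediction is determined by the explicit patterns a patient satisfies, each call comes with an individualized, human-readable justification, which is the 'fully explanatory' property claimed in the conclusion.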

    PATTERN-BASED FEATURE SELECTION IN GENOMICS AND PROTEOMICS

    A major difficulty in data analysis is due to the size of the datasets, which frequently contain large numbers of irrelevant or redundant variables. In particular, in some of the most rapidly developing areas of bioinformatics, e.g., genomics and proteomics, the expression or intensity levels of tens of thousands of genes or proteins are reported for each observation, in spite of the fact that very small subsets of these features are sufficient for distinguishing positive observations from negative ones. In this study, we describe a two-step procedure for feature selection. In the first “filtering” stage, a relatively small subset of relevant features is identified on the basis of several combinatorial, statistical, and information-theoretical criteria. In the second stage, the importance of the variables selected in the first step is evaluated based on the frequency of their participation in the set of all maximal patterns (defined as in the Logical Analysis of Data, and generated using an efficient, total-polynomial-time algorithm), and low-impact variables are eliminated. This step is applied iteratively, until arriving at a Pareto-optimal “support set”, which balances the conflicting criteria of simplicity and accuracy.
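
    As a rough sketch of the two-stage procedure described above (in Python, with placeholder scoring and pattern-frequency functions standing in for the combinatorial, statistical, and information-theoretical criteria and for the maximal-pattern generator of the study):

        def filter_stage(features, relevance, keep):
            """Stage 1: keep the `keep` highest-scoring features under some relevance criterion."""
            return sorted(features, key=relevance, reverse=True)[:keep]

        def iterative_elimination(features, pattern_frequency, min_share=0.01):
            """Stage 2: repeatedly drop features that rarely participate in maximal patterns."""
            while True:
                freq = pattern_frequency(features)  # feature -> share of maximal patterns using it
                low_impact = [f for f in features if freq.get(f, 0.0) < min_share]
                if not low_impact:
                    return features                 # remaining features form the "support set"
                features = [f for f in features if f not in low_impact]

        # toy usage: dummy relevance and frequency functions replace the real criteria
        genes = [f"g{i}" for i in range(1000)]
        kept = filter_stage(genes, relevance=lambda g: int(g[1:]) % 50, keep=40)
        support = iterative_elimination(kept, pattern_frequency=lambda fs: {f: 1.0 / len(fs) for f in fs})
        print(len(support))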

    Pattern-based clustering and attribute analysis

    The Logical Analysis of Data (LAD) is a combinatorics-, optimization-, and logic-based methodology for the analysis of datasets with binary or numerical input variables and binary outcomes. It has been established in previous studies that LAD provides a competitive classification tool, comparable in efficiency with the top classification techniques available. The goal of this paper is to show that the methodology of LAD can be useful in the discovery of new classes of observations and in the analysis of attributes. After a brief description of the main concepts of LAD, two efficient combinatorial algorithms are described for the generation of all prime, respectively all spanned, patterns (rules) satisfying certain conditions. It is shown that the application of classic clustering techniques to the set of observations represented in prime pattern space leads to the identification of a subclass of, say, positive observations which is accurately recognizable and sharply distinct from the observations in the opposite, negative, class. It is also shown that the set of all spanned patterns allows the introduction of a measure of significance and of a concept of monotonicity in the set of attributes.
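
    One way to picture the clustering step is to re-encode every observation by the set of patterns that cover it and then apply a classic clustering method to that binary representation. A minimal sketch, with hypothetical threshold patterns and assuming scikit-learn is available for the clustering step:

        import numpy as np
        from sklearn.cluster import KMeans

        def pattern_space(observations, patterns):
            """Binary matrix: entry (i, j) is 1 if pattern j covers observation i."""
            return np.array([[1 if p(obs) else 0 for p in patterns]
                             for obs in observations])

        # toy numeric observations and pattern-like rules (conjunctions of bounds)
        observations = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
        patterns = [
            lambda o: o[0] > 0.5 and o[1] < 0.5,
            lambda o: o[0] < 0.5 and o[1] > 0.5,
            lambda o: o[0] + o[1] > 0.9,
        ]

        X = pattern_space(observations, patterns)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
        print(labels)  # observations covered by similar sets of patterns cluster together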

    Comprehensive vs. comprehensible classifiers in logical analysis of data

    The main objective of this paper is to compare the classification accuracy provided by large, comprehensive collections of patterns (rules) derived from archives of past observations with that provided by small, comprehensible collections of patterns. This comparison is carried out here on the basis of an empirical study, using several publicly available data sets. The results of this study show that the use of comprehensive collections allows a slight increase in classification accuracy, and that the “cost of comprehensibility” is small.

    New Tools for Spatial Intelligence Education: the X-Colony Knowledge Discovery Kit

    This study introduces a new framework for developing spatial education programs, based on a geometric language and the manipulation of ensembles of polyhedra, called the X-Colony Knowledge Discovery Kit (KDK). The main goals of the KDK are to develop spatial intelligence, creativity, strategic planning, forecasting skills, abstract reasoning, self-confidence, and social skills. Landmark studies document that spatial education plays a central role in driving performance in science, technology, engineering, and mathematics (STEM) occupations, yet spatial education is under-studied and the infrastructure for research on spatial learning is still in its early stages. The KDK introduces a novel geometric language that allows visual communication and develops spatial abilities by engaging students in creative paper folding and various mental spatial transformations. The KDK is organized into program sessions consisting of cooperative, open-ended paper-construction activities that engage students to build modular constructions of gradual complexity and to explore various strategies for combining the constructs into novel configurations. The KDK supports the Core Math Standard and Science curricula and provides students the opportunity to discover connections between mathematics, science, and various other disciplines. A pilot case-control study conducted with fifth-grade students indicates an average increase of 17% in geometric reasoning after 8 hours of KDK activities.

    Consensus Algorithms for the Generation of All Maximal Bicliques

    We describe a consensus-type algorithm for determining all the maximal complete bipartite (not necessarily induced) subgraphs of a graph. We show that by imposing a particular order in which the consensus-type operations are executed, this algorithm becomes totally polynomial. By imposing a further restriction on the way the algorithm is executed, we derive an improved variant whose complexity is bounded by a polynomial that is cubic in the input size and only linear in the output size, and we show its high efficiency in numerous computational experiments on randomly generated graphs.
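
    To give a feel for a consensus-type step on bicliques (a rough illustration, not necessarily the exact operations or execution order of the paper): from two bicliques (X1, Y1) and (X2, Y2), taking the union on one side and the intersection on the other yields another complete bipartite (not necessarily induced) subgraph whenever the two sides are non-empty and disjoint, and the candidate can then be extended greedily to a maximal biclique.

        def is_biclique(adj, X, Y):
            """Every vertex of X is adjacent to every vertex of Y."""
            return all(y in adj[x] for x in X for y in Y)

        def consensus(X1, Y1, X2, Y2):
            """Consensus candidate: union on one side, intersection on the other."""
            X, Y = X1 | X2, Y1 & Y2
            return (X, Y) if X and Y and not (X & Y) else None

        def extend_to_maximal(adj, X, Y):
            """Greedily add vertices adjacent to the whole opposite side."""
            X, Y = set(X), set(Y)
            changed = True
            while changed:
                changed = False
                for v in adj:
                    if v not in X and v not in Y and all(u in adj[v] for u in Y):
                        X.add(v)
                        changed = True
                    elif v not in Y and v not in X and all(u in adj[v] for u in X):
                        Y.add(v)
                        changed = True
            return X, Y

        # toy graph given as an adjacency map
        adj = {1: {3, 4}, 2: {3, 4}, 3: {1, 2}, 4: {1, 2, 5}, 5: {4}}
        cand = consensus({1}, {3, 4}, {2}, {3, 4})
        if cand and is_biclique(adj, *cand):
            print(extend_to_maximal(adj, *cand))  # e.g. ({1, 2}, {3, 4})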